 output distance


ECS -- an Interactive Tool for Data Quality Assurance

arXiv.org Artificial Intelligence

Based on this, a wide variety of data quality properties is addressed, be it the identification of single data points such as outliers, false annotations, or isolated data, or the identification of groups of data points such as decision boundaries and local data point groups of identical output. The ECS makes it possible to identify all data points which do not match specifiable conditions. The method itself is designed so that interactions between the user and the data are supported, in order to simplify and speed up the quality assurance process.

The development of machine learning (ML) based systems has led to widespread use in research, industry, and everyday life. Even though ML systems show great performance in solving complex tasks, their use is mostly limited to domains where wrong decisions have only minor consequences. The application of ML systems in high-risk domains is currently problematic due to the needed quality, the lack of trustworthiness, and the expected legal basis. To give a legal framework for the application of ML systems, the European AI Act (European Commission 2021) is at the moment under development.


On the Learning and Learnability of Quasimetrics

arXiv.org Artificial Intelligence

Our world is full of asymmetries. Gravity and wind can make reaching a place easier than coming back. Social artifacts such as genealogy charts and citation graphs are inherently directed. In reinforcement learning and control, optimal goal-reaching strategies are rarely reversible (symmetrical). Distance functions supported on these asymmetrical structures are called quasimetrics. Despite their common appearance, little research has been done on the learning of quasimetrics. Our theoretical analysis reveals that a common class of learning algorithms, including unconstrained multilayer perceptrons (MLPs), provably fails to learn a quasimetric consistent with training data. In contrast, our proposed Poisson Quasimetric Embedding (PQE) is the first quasimetric learning formulation that both is learnable with gradient-based optimization and enjoys strong performance guarantees. Experiments on random graphs, social graphs, and offline Q-learning demonstrate its effectiveness over many common baselines.
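To make the notion of a quasimetric concrete, here is a minimal Python sketch. It is not taken from the paper and does not implement PQE; it only illustrates the standard definition (non-negativity, zero self-distance, and the triangle inequality, without requiring symmetry) using shortest-path distances on a directed graph. The toy graph, its edge weights, and the helper names `shortest_path_distances` and `is_quasimetric` are assumptions made for this example.

```python
import itertools
import math


def shortest_path_distances(n, edges):
    """All-pairs shortest paths (Floyd-Warshall) on a directed graph with n nodes."""
    d = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        d[u][v] = min(d[u][v], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def is_quasimetric(d):
    """Check zero self-distance, non-negativity, and the triangle inequality."""
    n = len(d)
    identity = all(d[i][i] == 0 for i in range(n))
    nonneg = all(d[i][j] >= 0 for i in range(n) for j in range(n))
    triangle = all(
        d[i][j] <= d[i][k] + d[k][j]
        for i, j, k in itertools.product(range(n), repeat=3)
    )
    return identity and nonneg and triangle


# Hypothetical toy graph: 0 -> 1 -> 2 is cheap, the way back 2 -> 0 is expensive.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 0, 5.0)]
d = shortest_path_distances(3, edges)

print(is_quasimetric(d))   # the quasimetric axioms hold
print(d[0][2], d[2][0])    # the two directions differ, so d is not a metric
```

Shortest-path distance on a directed graph satisfies the triangle inequality by construction, which is why it serves as a simple example of an asymmetric distance.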


Text Similarity w/ Levenshtein Distance in Python

#artificialintelligence

In this article I will go over the intuition behind how Levenshtein distance works and how to use it in building a plagiarism detection pipeline. Identifying similarity between texts is a common problem in NLP and is used by many companies worldwide. The most common application of text similarity is identifying plagiarized text. Educational institutions around the world, from elementary schools and high schools to colleges and universities, use services like Turnitin to ensure that the work submitted by students is original and their own. Text similarity is also commonly used by companies whose platforms are structured like Stack Overflow or Stack Exchange.
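As a rough illustration of the idea, the following Python sketch computes the Levenshtein distance with the standard dynamic-programming recurrence: the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. It is not the article's own implementation, which is not shown in this excerpt; the `similarity` helper and its normalization by the longer string's length are assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ca
                    current[j - 1] + 1,      # insert cb
                    previous[j - 1] + cost,  # substitute, or match for free
                )
            )
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical normalized score in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))  # 3
print(round(similarity("kitten", "sitting"), 3))
```

In a plagiarism check, a score like this could be compared against a threshold chosen for the data at hand; the appropriate threshold is not something this sketch can determine.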